Skip to content

htlcswitch: separate onchain and offchain intercpeted HTLCs cleanly#10895

Merged
ziggie1984 merged 5 commits into
lightningnetwork:masterfrom
ziggie1984:fix-onchain-interceptor-held-htlc
Jun 26, 2026
Merged

htlcswitch: separate onchain and offchain intercpeted HTLCs cleanly#10895
ziggie1984 merged 5 commits into
lightningnetwork:masterfrom
ziggie1984:fix-onchain-interceptor-held-htlc

Conversation

@ziggie1984

@ziggie1984 ziggie1984 commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Change Description

Fixes #10892

Look at the first commit which shows via itests the failure cases of the old Interceptor Implementation

The second commit introduces a new interface and distingishes between onchain and offchain HTLC for the interceptor. It does not change any public interface.

@ziggie1984 ziggie1984 self-assigned this Jun 11, 2026
@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label Jun 11, 2026
@github-actions

Copy link
Copy Markdown

PR Severity: CRITICAL

Highest-severity file determines level | 4 non-test files | 467 lines changed

Critical (2 files):

  • htlcswitch/held_htlc_set.go - HTLC forwarding state machine
  • htlcswitch/interceptable_switch.go - HTLC forwarding/payment routing interceptable switch

Medium (1 file):

  • witness_beacon.go - Root-level Go file

Low (4 files - excluded from severity bump counts):

  • docs/release-notes/release-notes-0.21.1.md
  • htlcswitch/held_htlc_set_test.go
  • itest/list_on_test.go
  • itest/lnd_forward_interceptor_test.go

Analysis

This PR modifies core htlcswitch package files. Changes to held_htlc_set.go and interceptable_switch.go affect the HTLC forwarding and payment routing state machine, warranting expert review.

Bump check: 4 non-test files (threshold >20), 467 lines changed (threshold >500), 1 critical package. No bump applied.

To override, add a severity-override-{critical,high,medium,low} label.
<!-- pr-severity-bot -->

@ziggie1984 ziggie1984 added this to the v0.21.1 milestone Jun 11, 2026
@ziggie1984 ziggie1984 added this to v0.21 Jun 11, 2026
@gemini-code-assist

Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the HTLC interception mechanism to accurately manage HTLCs that transition from off-chain forwarding to on-chain resolution, particularly in scenarios involving channel force-closures. By introducing distinct handling for off-chain and on-chain held HTLCs, the changes prevent incorrect auto-failing or misdirection of settlement attempts, thereby enhancing the robustness and reliability of HTLC management within the system.

Highlights

  • Differentiated HTLC Holding: The heldHtlcSet now distinguishes between offChainHeld and onChainHeld entries, allowing for different behaviors based on whether an HTLC is in the off-chain forwarding flow or the on-chain contractcourt flow.
  • Correct On-Chain HTLC Resolution: On-chain held HTLCs are now settle-only and expire by pruning local interceptor state, preventing issues like premature auto-failing after a restart or incorrect routing of settlement messages to the link mailbox instead of the witness beacon.
  • Dynamic HTLC State Transition: If an HTLC is initially held off-chain and then re-offered on-chain (e.g., due to a channel force-close), the stored entry in the heldHtlcSet is replaced to ensure future settlement attempts correctly reach the witness beacon.
  • Robust Resolution Logic: HTLC resolution now removes an entry from the held set only after the selected action (settle, fail, resume) successfully completes, improving atomicity and reliability.
  • Integration Test Coverage: New integration tests have been added to cover critical scenarios, including settling an on-chain intercepted HTLC after a node restart and settling an HTLC that transitions from off-chain to on-chain without a restart.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the heldHtlcSet to distinguish between off-chain and on-chain held HTLCs using a new heldEntry interface, ensuring proper handling of resolutions and expirations for both flows. It also updates the InterceptableSwitch and witness_beacon to support this, and adds comprehensive unit and integration tests. A critical deadlock risk was identified in witness_beacon.go where SubscribeUpdates holds a lock while making a blocking call to the interceptor, which can conflict with the switch's event loop attempting to acquire the same lock during resolution; releasing the lock early is recommended.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread witness_beacon.go Outdated
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch 2 times, most recently from cf2e20c to 02347b0 Compare June 11, 2026 01:35
@ziggie1984

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the heldHtlcSet to track held HTLCs as either off-chain or on-chain entries using a new heldEntry interface. This change ensures that on-chain re-offers can replace old off-chain holds, allowing settlements to reach the witness beacon after an incoming channel force-closes. Additionally, integration tests are added to verify on-chain settlement scenarios, and minor fixes are made to the witness beacon subscription flow. The reviewer's feedback correctly points out a style guide violation where the newHeldHtlcSet function lacks a documentation comment.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread htlcswitch/held_htlc_set.go
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch from 02347b0 to 6495243 Compare June 11, 2026 01:50
@saubyk saubyk removed this from v0.21 Jun 11, 2026
@saubyk saubyk added this to lnd v0.22 Jun 11, 2026
@saubyk saubyk removed this from the v0.21.1 milestone Jun 11, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in lnd v0.22 Jun 11, 2026
@saubyk saubyk added this to the v0.22.0 milestone Jun 11, 2026
@saubyk saubyk moved this from Backlog to In progress in lnd v0.22 Jun 11, 2026
@ziggie1984 ziggie1984 modified the milestones: v0.22.0, v0.21.1 Jun 11, 2026
@saubyk saubyk added this to v0.21 Jun 11, 2026
@saubyk saubyk removed this from lnd v0.22 Jun 11, 2026
@saubyk saubyk moved this to In progress in v0.21 Jun 11, 2026
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch 4 times, most recently from 5e3c3c3 to 424d1a8 Compare June 11, 2026 20:35
@ziggie1984

Copy link
Copy Markdown
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the heldHtlcSet to support both off-chain and on-chain held HTLCs via a new heldEntry interface, resolving an issue where on-chain forward interceptor settlements failed to reach the witness beacon after a channel force-close. The changes also include updates to InterceptableSwitch and witness_beacon.go, comprehensive unit and integration tests, and updated documentation. Feedback is provided to add a documentation comment to newHeldHtlcSet to align with the repository's style guide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread htlcswitch/held_htlc_set.go
@saubyk saubyk added this to the v0.21.1 milestone Jun 25, 2026
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch 2 times, most recently from 4768feb to ab5dbca Compare June 25, 2026 16:10
@ziggie1984 ziggie1984 requested a review from Roasbeef June 25, 2026 16:10
@ziggie1984 ziggie1984 added the backport-v0.20.x-branch This label is used to trigger the creation of a backport PR to the branch `v0.20.x-branch`. label Jun 25, 2026
@gijswijs gijswijs self-requested a review June 26, 2026 11:25
@yyforyongyu

Copy link
Copy Markdown
Member

I think the promotion path should notify the connected interceptor with the updated on-chain intercept.

Step-by-step case:

  1. Bob runs an external HtlcInterceptor service.
  2. Alice sends a payment through Bob.
  3. Bob intercepts the incoming HTLC off-chain before normal forwarding.
  4. Bob sends the interceptor client an off-chain intercept request for the circuit key.
  5. That request includes auto_fail_height = incoming_timeout - cltvRejectDelta.
  6. The interceptor service records this circuit and deadline.
  7. The interceptor service holds the HTLC because it is waiting on external business logic, a recipient, or a preimage source outside lnd.
  8. Alice force-closes the incoming Alice-Bob channel while the HTLC is still held.
  9. Bob's contractcourt starts resolving the force-closed channel.
  10. For the held HTLC, contractcourt subscribes to the witness beacon and waits for a matching preimage.
  11. The witness beacon re-offers the same HTLC to the interceptor through the on-chain interceptor path.
  12. This new on-chain intercept has the same circuit key but different semantics.
  13. Off-chain semantics were: RESUME, FAIL, SETTLE, or hold are possible until auto-fail height.
  14. On-chain semantics are: only SETTLE is useful; RESUME and FAIL no longer apply.
  15. The on-chain useful deadline is the HTLC refund timeout, not the old off-chain auto-fail height.
  16. This PR correctly replaces Bob's internal held entry from off-chain to on-chain.
  17. Because the entry was already held, interceptOnChain sets wasHeld == true.
  18. Because wasHeld == true, this PR does not send the updated on-chain intercept to the connected interceptor client.
  19. The interceptor client still only has the old off-chain request.
  20. The interceptor client still believes the relevant deadline is incoming_timeout - cltvRejectDelta.
  21. The interceptor client does not learn that the HTLC is now on-chain and settle-only.
  22. The interceptor client does not learn that it can still settle until RefundTimeout.
  23. The old off-chain auto-fail height arrives.
  24. A deadline-aware interceptor client abandons the circuit or stops waiting for the external preimage.
  25. Later, before the actual on-chain refund timeout, the preimage becomes available outside lnd.
  26. Because the client abandoned the circuit, it does not send SETTLE to Bob's lnd.
  27. Bob's contractcourt keeps waiting for the preimage in the witness beacon.
  28. Bob's lnd never learns the external preimage.
  29. The on-chain HTLC reaches timeout.
  30. Alice or the counterparty takes the timeout path.
  31. Bob misses the on-chain success claim that would have been valid if the interceptor had sent SETTLE.

Severity: blocking. This is in the PR's target no-restart path. The internal held entry is fixed, but the live interceptor client is left with stale off-chain state. For a pure held-before-forward router this may be a failed payment/opportunity loss, but for an interceptor-backed service with external accounting, or a case where Bob already paid/settled downstream, this can become direct funds loss because Bob fails to claim upstream.

This issue is introduced by this PR because it changes the internal held entry from off-chain to on-chain, but intentionally suppresses the duplicate notification for already-held circuit keys. That duplicate notification carries a real state transition.

Suggested fix: re-send the on-chain packet on promotion. The proto docs already say repeated circuit keys must be handled idempotently.

func (s *InterceptableSwitch) interceptOnChain(fwd InterceptedForward) error {
	if err := s.heldHtlcSet.addOnChain(fwd); err != nil {
		return err
	}

	if s.interceptor != nil {
		s.sendForward(fwd)
	}

	return nil
}

The unit test should use different off-chain and on-chain deadlines and assert that the promoted on-chain packet is delivered:

offChain := newMockInterceptedForward(key, 80)
onChain := newMockOnChainInterceptedForward(key, 100)

require.NoError(t, s.heldHtlcSet.addOffChain(offChain))
require.NoError(t, s.interceptOnChain(onChain))

require.Len(t, intercepted, 1)
require.Equal(t, key, intercepted[0].IncomingCircuit)
require.Equal(t, int32(100), intercepted[0].AutoFailHeight())

@ziggie1984

Copy link
Copy Markdown
Collaborator Author

I think the promotion path should notify the connected interceptor with the updated on-chain intercept.

So the idea was to not change behavior to much from the client perspective in case they become confused why they get the same fwd package again, and would basically discard it, I think it makes sense to maybe add this with the RPC change which introduces the deadline timeout as requested here: #10921, but I can of course change it, just let me know what your final opinions are.

@gijswijs gijswijs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this PR a lot. It creates a clean separation of off-chain and on-chain interceptor handling. The held_htlc_set.go now reads much cleaner. I did find some small nits and issues that I've commented on in-line.

Comment thread itest/list_on_test.go
Comment thread htlcswitch/held_htlc_set.go Outdated
Comment thread htlcswitch/held_htlc_set.go Outdated
Comment thread htlcswitch/held_htlc_set.go Outdated
Comment thread htlcswitch/interceptable_switch.go
Comment thread htlcswitch/interceptable_switch.go Outdated
Comment thread witness_beacon.go
Add coverage for held forwards that move on chain after the
incoming channel force closes.

The restart case exercises the path where Bob loses the in-memory
held set and contractcourt re-offers the HTLC through the witness
beacon. The no-restart case keeps the original off-chain hold and
proves that settlement must still reach the on-chain resolver.
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch 2 times, most recently from 0836c8d to b689fe6 Compare June 26, 2026 15:18
Store held forwards as off-chain or on-chain entries instead of a raw
InterceptedForward map. Off-chain entries keep the existing resume, fail,
settle and auto-fail behavior. On-chain entries are settle-only and
expire by pruning local interceptor state.

When contractcourt re-offers a circuit that is already held off-chain,
replace the stored entry with the on-chain forward so a later SETTLE
reaches the witness beacon instead of the old link mailbox path.

Also set the on-chain interceptor deadline to the HTLC refund timeout.
This keeps the public interceptor deadline populated while ensuring only
off-chain held entries use that value to fail back.

Only off-chain held HTLCs can be released when an optional interceptor
disconnects, because they can resume into the link forwarding flow.

On-chain held HTLCs have no link flow to resume. Keep them in the held
set so a reconnecting interceptor can replay and settle them while
contractcourt waits for the preimage or on-chain expiry.

Use distinct internal deadline types for off-chain auto-fail heights and
on-chain settlement deadlines instead of overloading the intercepted packet
field.

Project both variants back into the existing router RPC auto_fail_height
field to preserve wire compatibility. Reject mismatched held HTLC deadline
types in tests.

On-chain intercepted HTLCs can only be settled. Resume and fail actions
already return concrete errors through the on-chain intercepted forward, so
let those errors propagate to the interceptor client instead of converting
them to success.

Keep the held entry tracked on these errors so the client can reconnect and
settle the HTLC later.
Release the preimage beacon lock before invoking the on-chain
interceptor. The interceptor path can block on the htlcswitch event
loop, while resolution of another held on-chain HTLC can call back
into the beacon to add a preimage.

If interceptor delivery fails after the subscriber was registered,
cancel the subscription before returning the error.

On-chain held entries are replay handles for the interceptor while
contractcourt waits for a preimage or on-chain expiry. Once the resolver
tears down, keeping the handle until the refund timeout can replay a stale
HTLC to a reconnecting interceptor.

Thread a dedicated cleanup signal from the witness subscription cancel path
back through the interceptable switch event loop. The held set only removes
on-chain entries for that signal, leaving off-chain entries under the link
flow lifecycle.
routerrpc: document on-chain interceptor responses
@ziggie1984 ziggie1984 force-pushed the fix-onchain-interceptor-held-htlc branch from b689fe6 to 9c5f32a Compare June 26, 2026 15:23
@ziggie1984

Copy link
Copy Markdown
Collaborator Author

I think the promotion path should notify the connected interceptor with the updated on-chain intercept.

Updated it but slightly more conservative:

  • fresh on-chain hold => notify
  • off-chain -> on-chain promotion => notify

But it does not send for:

  • duplicate already-on-chain offer

@ziggie1984 ziggie1984 requested a review from gijswijs June 26, 2026 15:26

@gijswijs gijswijs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. All my comments are resolved.

LGTM! 🛡️

@ziggie1984 ziggie1984 requested a review from yyforyongyu June 26, 2026 15:36
@yyforyongyu

Copy link
Copy Markdown
Member

Follow-up/non-blocking robustness note: CancelSubscription now synchronously calls back into the interceptable switch via RemoveOnChainIntercept. I don't think this should block the critical fix in this PR, but it creates a liveness dependency worth tracking or addressing in a follow-up.

Step-by-step scenario:

  1. Bob has one or more on-chain held HTLCs in heldHtlcSet.
  2. An interceptor client connects or reconnects to Router.HtlcInterceptor.
  3. The switch event loop receives the interceptor registration.
  4. setInterceptor replays currently held HTLCs synchronously from the switch event-loop goroutine.
  5. Replay calls s.heldHtlcSet.forEach(s.sendForward).
  6. sendForward calls the registered routerrpc interceptor callback synchronously.
  7. The routerrpc callback calls stream.Send(interceptionRequest).
  8. The interceptor client is operator-controlled/authenticated, but it can still stall, stop reading, deadlock, or be paused.
  9. Once the gRPC/HTTP2 send window is exhausted, stream.Send blocks waiting for the client to read.
  10. The single interceptable switch event-loop goroutine is now blocked inside interceptor delivery.
  11. An on-chain resolver finishes, times out, or otherwise tears down its witness subscription.
  12. The resolver runs preimageSubscription.CancelSubscription().
  13. CancelSubscription removes the witness subscriber, then calls p.cancelInterceptor(inKey) synchronously.
  14. p.cancelInterceptor is s.interceptableSwitch.RemoveOnChainIntercept.
  15. RemoveOnChainIntercept sends the circuit key on the unbuffered onchainInterceptDone channel.
  16. That send requires the switch event loop to receive from onchainInterceptDone.
  17. The switch event loop cannot receive because it is still blocked in stream.Send.
  18. Resolver cleanup is now blocked behind progress of the external interceptor stream.

Consequence: resolver cleanup can be delayed, stale on-chain held entries can remain in memory longer than intended, and a resolver goroutine can remain blocked until the interceptor stream unblocks or the switch shuts down. This is not a remote Lightning peer exploit; it is a robustness issue with an operator-controlled interceptor service.

Severity: medium/low follow-up. The PR fixes the funds-loss path, so I would not block on this, but the cleanup path should ideally not synchronously depend on the same switch loop that performs blocking interceptor delivery.

Minimal follow-up fix shape: decouple resolver cleanup from the synchronous switch-loop rendezvous, for example by making the cancellation callback enqueue cleanup asynchronously:

CancelSubscription: func() {
	p.Lock()

	delete(p.subscribers, clientID)

	close(client.quit)
	p.Unlock()

	go func() {
		err := p.cancelInterceptor(inKey)
		if err != nil {
			srvrLog.Errorf("Cannot remove on-chain "+
				"intercept %v: %v", inKey, err)
		}
	}()
},

A stronger design would decouple stream.Send from the switch event loop or add a dedicated async cleanup queue with explicit shutdown semantics.

@ziggie1984 ziggie1984 merged commit cbe7774 into lightningnetwork:master Jun 26, 2026
44 of 81 checks passed
@github-project-automation github-project-automation Bot moved this from In review to Done in v0.21 Jun 26, 2026
@github-actions

Copy link
Copy Markdown

Created backport PR for v0.20.x-branch:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-10895-to-v0.20.x-branch
git worktree add --checkout .worktree/backport-10895-to-v0.20.x-branch backport-10895-to-v0.20.x-branch
cd .worktree/backport-10895-to-v0.20.x-branch
git reset --hard HEAD^
git cherry-pick -x 9b31ba83ef52e2d2800f53564f44b9efd4a75ca3 eb1193f80bbe5f0ec2f6618657f1deada5f6561c 98da7b4a56a75ba2dbf47391d43229fd9696e98a 8909c2fbf5535b81ee18c4f4e7fe0160ca43e725 9c5f32a2ec8eab0760a46c3d0357d4127794425f
git push --force-with-lease

@github-actions

Copy link
Copy Markdown

ziggie1984 added a commit that referenced this pull request Jun 26, 2026
…21.x-branch

[v0.21.x-branch] Backport #10895: htlcswitch: separate onchain and offchain intercpeted HTLCs cleanly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-v0.20.x-branch This label is used to trigger the creation of a backport PR to the branch `v0.20.x-branch`. backport-v0.21.x-branch This label triggers a backport to branch `v0.21.x-branch ` bug fix gateway-active severity-critical Requires expert review - security/consensus critical

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[bug]: on-chain intercepts evicted after one block due to unset AutoFailHeight

5 participants